Skip to content

Conversation

@exur00
Copy link
Contributor

@exur00 exur00 commented Oct 28, 2025

Fixed solution that was merged in #181 but was reverted because it caused issues. It has been built and tested on the jenkins without issues.

@exur00 exur00 requested a review from MichielDerhaeg October 30, 2025 10:53
@exur00 exur00 self-assigned this Oct 30, 2025
Copy link
Contributor

@MichielDerhaeg MichielDerhaeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks fine. Let's improve the commit message a bit. Help the reviewer understand what you did.

If the target has no barrel shifter, clobber is now emitted in a parallel RTL expression. si3_loop matches the parallel expression and emits the necessary expressions that may or may not clobber the CC. This happens in split1, before registers are assigned and dead stores are removed. If the clobber was unnecessary, its consequences will be removed by those steps.

A bit too verbose IMO. Just describe the problem we're solving:

Devices without a barrel shifter end up using a sequence of
instructions. These can use the condition codes and/or loop count
register, so those need to be marked as 'clobbered'. These clobbers were previously added only after split1, which is too late. This patch adds these clobbers from the beginning, in the define_expand.

Explain succinctly the changes you did, why did you merge _nobs and _loop?

si3_cnt1_clobber is placed below all the specific si3_cnt1 patterns.
It matches the si3_loop pattern in the case operand2 has become
const_int 1, and remove the clobber so the specific si3_cnt1 patterns
can be matched.
If si3_cnt1_clobber is placed above these patterns, some si3_cnt
would try to emit extra clobbers to match it instead of their more
specific pattern that match exactly. This would cause clobbers of hard
registers, causing failure.

This is a bit hard to parse. But I don't think you need to mention this. I think the comment you wrote in the code is sufficient.

@luismgsilva
Copy link
Member

Implementation looks fine. Let's improve the commit message a bit. Help the reviewer understand what you did.

If the target has no barrel shifter, clobber is now emitted in a parallel RTL expression. si3_loop matches the parallel expression and emits the necessary expressions that may or may not clobber the CC. This happens in split1, before registers are assigned and dead stores are removed. If the clobber was unnecessary, its consequences will be removed by those steps.

A bit too verbose IMO. Just describe the problem we're solving:

Devices without a barrel shifter end up using a sequence of
instructions. These can use the condition codes and/or loop count
register, so those need to be marked as 'clobbered'. These clobbers were previously added only after split1, which is too late. This patch adds these clobbers from the beginning, in the define_expand.

Explain succinctly the changes you did, why did you merge _nobs and _loop?

si3_cnt1_clobber is placed below all the specific si3_cnt1 patterns.
It matches the si3_loop pattern in the case operand2 has become
const_int 1, and remove the clobber so the specific si3_cnt1 patterns
can be matched.
If si3_cnt1_clobber is placed above these patterns, some si3_cnt
would try to emit extra clobbers to match it instead of their more
specific pattern that match exactly. This would cause clobbers of hard
registers, causing failure.

This is a bit hard to parse. But I don't think you need to mention this. I think the comment you wrote in the code is sufficient.

This, and perhaps a before and after assembly example might help.

@exur00 exur00 force-pushed the loeka/no-barrel-shifter branch 3 times, most recently from a035cf1 to b25c113 Compare November 3, 2025 11:52
Address the issue explained here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120375

Devices without a barrel shifter end up using a sequence of
instructions. These can use the condition codes and/or loop count
register, so those need to be marked as 'clobbered'. These clobbers were
previously added only after split1, which is too late. This patch adds
these clobbers from the beginning, in the define_expand.

Previously, define_insn_and_split *<insn>si3_nobs would match any shift or
rotate instruction and would generate the necessary patterns to emulate a
barrel shifter, but it did not have any output assembly for itself.
In many cases this would create a loop with parallel clobbers. This pattern
is then matched by the <insn>si3_loop pattern.

In the no-barrel-shifter.c test tree code:

;; no-barrel-shifter.c:9:     int sign = (x >> 31) & 1;
_2 = x.0_1 >> 31;

in the expand pass becomes the following pattern that matches *lshrsi3_nobs:

(insn 18 17 19 4 (set (reg:SI 153 [ _2 ])
        (lshiftrt:SI (reg/v:SI 156 [ x ])
            (const_int 31 [0x1f]))) "test2.c":9:24 -1
     (nil))

This pattern misses the necessary clobbers and remains untouched until the
split1 pass. Together with the later branch it becomes

;; no-barrel-shifter.c:9:     int sign = (x >> 31) & 1;
	add.f	0,r0,r0
;; no-barrel-shifter.c:14:     if (mag == 0x7f800000)
	beq.d	.L8
;; no-barrel-shifter.c:9:     int sign = (x >> 31) & 1;
	rlc	r0,0

Leading to an issue: the add.f instructions overwrites CC but beq expects
CC to contain an earlier value indicating mag == 0x7f800000.

Now, these are combined in define_insn_and_split <insn>si3_loop that is
explicitly emitted in the define_expand and already contains the clobbers.
This can then be split into another pattern or remain the loop pattern.

In the expand pass, the same example now becomes:

(insn 18 17 19 4 (parallel [
            (set (reg:SI 153 [ _2 ])
                (lshiftrt:SI (reg/v:SI 156 [ x ])
                    (const_int 31 [0x1f])))
            (clobber (reg:SI 60 lp_count))
            (clobber (reg:CC 61 cc))
        ]) "test2.c":9:24 -1
     (nil))

Because the correct clobbers are now taken into account, the branch condition
is reevaluated by using breq instead of br.

;; no-barrel-shifter.c:9:     int sign = (x >> 31) & 1;
	add.f	0,r0,r0
	rlc	r0,0
;; no-barrel-shifter.c:14:     if (mag == 0x7f800000)
	breq	r2,2139095040,.L8

Regtested for arc.

Co-authored-by: Keith Packard <[email protected]>
Signed-off-by: Loeka Rogge <[email protected]>
@exur00 exur00 force-pushed the loeka/no-barrel-shifter branch from b25c113 to 3f600ca Compare November 3, 2025 11:55
@exur00
Copy link
Contributor Author

exur00 commented Nov 3, 2025

I rewrote the commit with a better explanation and some RTL and assembly from the test. I also added keithp (who reported the bug) as co-author.

Copy link
Member

@luismgsilva luismgsilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great commit message.

@exur00 exur00 merged commit bcbd1fa into arc-2026.03 Nov 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants